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Abstract 

Much forensic inference based upon DNA evidence is made assuming Hardy-Weinberg Equilibrium (HWE) for the 
genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, and 
their limitations become more obvious when testing for deviation within multiallelic DNA loci. The most popular meth- 
ods-Chi-square and Likelihood-ratio tests-are based on asymptotic results and cannot guarantee a good perfor- 
mance in the presence of low frequency genotypes. Since the parameter space dimension increases at a quadratic 
rate on the number of alleles, some authors suggest applying sequential methods, where the multiallelic case is re- 
formulated as a sequence of "biallelic" tests. However, in this approach it is not obvious how to assess the general 
evidence of the original hypothesis; nor is it clear how to establish the significance level for its acceptance/rejection. 
In this work, we introduce a straightforward method for the multiallelic HWE test, which overcomes the aforemen- 
tioned issues of sequential methods. The core theory for the proposed method is given by the Full Bayesian Signifi- 
cance Test (FBST), an intuitive Bayesian approach which does not assign positive probabilities to zero measure sets 
when testing sharp hypotheses. We compare FBST performance to Chi-square, Likelihood-ratio and Markov chain 
tests, in three numerical experiments. The results suggest that FBST is a robust and high performance method for 
the HWE test, even in the presence of several alleles and small sample sizes. 
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Introduction 

The Hardy-Weinberg law is one of most important 
principles in population genetics, and establishes a direct 
relationship between allele and genotypic proportions in a 
population. This law states that in a large population of 
panmictic dioecious organisms with non-overlapping gen- 
erations, the allelic and genotypic frequencies at a locus 
will stay unchanged, provided that migration, mutation, 
and natural selection do not affect that locus. When these 
conditions hold, it is said that the locus is under Hardy- 
Weinberg Equilibrium (HWE). 

This principle was discussed independently by Yule, 
Pearson and Castle (between 1902 and 1904), for some par- 
ticular allele frequencies (see references in Crow and Ki- 
mura, 1972). In 1908, Godfrey Hardy presented the general 
principle for two alleles (Hardy, 1908). This principle was 
called Hardy's law for 35 years, until Stern (1943) called at- 
tention to an article of Weinberg (1908) showing the same 
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principle at the same time and demonstrating its validity for 
multiple alleles (Crow, 1999). 

Since its postulation, several results in population ge- 
netics and much forensic inference based upon DNA evi- 
dence have been based on the assumption that HWE is valid 
for the genetic loci of interest. Some statistical tests to de- 
tect and measure deviation from HWE have been devised, 
and their limitations have become more obvious when test- 
ing for deviation within multiallelic DNA loci. The most 
common approach consists of goodness-of-fits tests, like 
Chi-square and Likelihood-ratio, which are heavily based 
on asymptotic results, and can sometimes lead to false re- 
jection or acceptance of HWE when the sample sizes are 
small and/or some genotype sample frequencies are very 
small (Emigh, 1980). Another approach involves exact 
tests, but is restricted to small dimensions and allele num- 
bers. 

A Bayesian sequential method for multiallelic HWE 
test was proposed by Pereira et al. (2006), who suggested 
reformulating the multiallelic case as a sequence of "bial- 
lelic" tests. In that work, the central component is the Full 
Bayesian Significance Test (FBST), an intuitive Bayesian 
approach which does not assign positive probabilities to 
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zero measure sets when testing sharp hypotheses (Pereira 
and Stern, 1999). Although the sequential method avoids 
the quadratic increase of parameter space dimension with 
respect to the number of alleles, it is not obvious how to as- 
sess the general evidence of the original hypothesis; nor is it 
clear how to establish the significance level for its accep- 
tance or rejection (see DeGroot, 1970). 

In this work, we propose a method for the multiallelic 
HWE test, based on the FBST. FBST has many theoretic 
and practical advantages over other approaches, and it has 
shown to be robust in several high-dimensional problems 
(Lauretto et al, 2008). 



Testing Procedures 

In this section we present three tests used in our com- 
parative study. These and other approaches are described 
by Emigh (1980), Guo and Thompson (1992), Hernandez 
and Weir (1989) and Montoya-Delgado et al. (2001). 

Chi-square goodness-of-fit test 

This test involves calculating the sample chi square 
value, 



3C 



j<i=l 



(4) 



Background 

In this section we introduce some notations, and the 
Hardy- Weinberg Equilibrium (HWE) formula. Let us con- 
sider k alleles A i,A 2 , ...,Ak in a locus. The main interest is to 
assess the population relative frequencies of the genotypes 
AjAj =1,2, ...,k) which we denote by py. As usual in the 
literature (see Hardy, 1908), we assume that the allele fre- 
quencies do not depend on sex and thus are symmetric, that 
is, AjAj is equivalent to AjAj and py =py,. Therefore, the pa- 
rameter of interest is the (lower triangular) matrix of geno- 
type proportions: 

9 = (Qy), with 9„ =p tb Qy = 2 PlJ for 1 <j < i < k. 

We denote by puPi, —,Pk the (unknown) population 
frequencies of alleles A\, A 2 , A k , with p, > 0 and 
Pi = L When the locus is under HWE, the genotype 

proportions are as follows: 



9, = p); 9, =2p, Pj , \<j<i<k. 



(1) 



In order to test the HWE in a locus, one considers a 
sample of n individuals drawn randomly from the popula- 
tion. Such a sample can be presented as the array x, whose 
elements x tj , 1 < j <i<k, are counts of genotypes AjAj. The 
sample size n is n = ^ sj ] x tj , and the sample frequency of 

allele A t is n i =y. ,x H + '5^ i . : x H . Note that =2n, 

since all loci have two alleles and this sum is the number of 
alleles in the whole sample. The sample proportion of allele 
Aj, p. , is given by 



Pi 



In 



(2) 



Assuming that each individual genotype does not de- 
pend on remaining individuals in the same generation, we 
can consider that the genotype frequencies Xy follow a 
multinomial distribution with unknown parameter 9, 



* 0' 

P(x|9) = «!n^r 

j*i=lXy ! 
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Under HWE, this quantity has a chi-square distribu- 
tion with k(k— l)/2 degrees of freedom. 

The Chi-square goodness-of-fit test with continuity 
correction involves calculating the previous statistics, with 
the subtraction in each term of a correction constant c: 



j<i=i 



(\Xjj 



E u 



(5) 



Usually c = 0.5 is the value chosen. 

Likelihood-ratio tests 

The likelihood function, given a sample, follows di- 
rectly from the multinomial distribution presented in 
Eq. (3). A Likelihood-ratio test is constructed by compar- 
ing the likelihood maximized under the hypothesis, L 0 , with 
the maximum likelihood, L\, not constrained by the hypoth- 
esis. For HWE we have 



L, = 



— fl* 



-{j ■ (6) 



with the sample allelic frequencies, p n given by Eq. (2). 
The test statistic 



G 2 



-2 In 



(7) 



is asymptotically distributed as a chi-square distribution 
with k(k- l)/2 degrees of freedom. 

Markov Chain Monte Carlo (MCMC) method 

Proposed by Guo and Thompson (1992), the method 
consists of an adaptation of the Metropolis algorithm, with 
the construction of a Markov chain with equilibrium distri- 
bution matching the genotype probabilities under HWE of 
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samples that have the same allelic counts as the observed 
data. 

Under HWE and conditional on sample allele counts, 
Jt\, ti2, the probability of obtaining the samplex is (see 
Levene, 1949): 

?r(x\n l ,...,n li ) = ' 7J"\ (8) 
(2n)\[[ Xij 

Given the data x, the test evaluates 

P = £PrO0, (9) 

yep 

where p = { y. Pr( y) < Pr(x), x e T 0 } and 

r„ =F(x)-{y: y has the same allele counts as does x}. (10) 

The MCMC algorithm is performed in order to esti- 
mate the probability P in E. (9). Rejection or acceptance of 
the null hypothesis depends on whether P is smaller than a 
pre-specified significance level a. 

Methods 

The Full Bayesian Significance Test (FBST) 

The Full Bayesian Significance Test (FBST) was pro- 
posed by Pereira and Stern (1999) as a coherent and intu- 
itive Bayesian test. It assumes that the hypothesis, H, is 
defined as a subset defined by inequality and equality con- 
straints: 

H:Qe@ H for 0„ = {9 e &\g(Q) < 0 a h(Q) = 0}, 

and (11) 

0c 9T. 

For simplicity, we often use H for & H . FBST is par- 
ticularly focused on precise hypotheses: i.e., 
dim(@//) < dim(@). In this work, f x (9) denotes the poste- 
rior probability density function, given the observation x. 
Bold 0 and 1 denote vectors of appropriate dimensions. 

For the HWE test, the parametric space consists of all 
arrays of genotype proportions 

© ={e = (9,. ),1 <j < i <k\ 9 > 0 a g sw 9, = l}. (12) 

The space on hypothesis is 

H = ® H ={Qe®\3p l ,p 2 ,..., Pk :Q, i = pUi = l,...,k), 

8 iJ =2p l p J U<i),0<P<hpl = l}- 

As previously stated, we consider that the genotype 
frequencies x follow a multinomial distribution, given by 
Eq. (3). Taking as a priori the Dirichlet distribution with pa- 
rameters (1,1 1 ), i.e., a uniform distribution, then the a 

posteriori is a Dirichlet distribution with parameters 
(pen + \,X%\ + l,x 22 + 1, ...,x H .+ 1) which is proportional to 
the likelihood function (DeGroot, 1970): 



f x (Q) = T(n + k)fl «fl9? ( 14 ) 

ys,*=ll (Xjj + I) ; fii= i 

The computation of the evidence measure used on the 
FBST is performed in two steps: 

1 . The optimization step consists of finding the maxi- 
mum (supremum) of the posterior under the null hypothe- 
sis, 9* = argsup„/ x (9), / = /,(9') 

2. The integration step consists of integrating the pos- 
terior density over the Tangential Set, T, where the poste- 
rior is higher than anywhere in the hypothesis, i.e., 

Yv{H) = Pr(9 6 f\x) = f f x (Q)dQ, where 

f (15) 

f ={9e0:/ r (9)>r} 

Ev(H) is the evidence against H, and EV(H) = 1 - Ev(H) is 
the evidence supporting (or in favor of) H. For a better un- 
derstanding of this evidence measure, Figure 1 illustrates 
two examples in the biallelic case, showing the null and tan- 
gential sets (©// and T). Since 9 2 i = 1 - 9n - 9 2 2, the para- 
metric space is fully defined by homozygote proportions, 
0 1 1 and 9 22 - The parameter space corresponds to the area in- 
side the triangle. Sample genotype counts for^n, A 2 \, A 2 2 
and Ev(H) are also shown in each graph. Marker '*' repre- 
sents the point 9* of maximum a posteriori density in the 
constrained space & H , and the level curve tangent to 9* cor- 
responds to T frontier. Intuitively, if the hypothesis set is in 
a region of "low" posterior density (as in the example 1), 
then Tis "heavy" and therefore Ev(H) is "large" (= 0.91), 
meaning "strong" evidence against//. On the other hand, as 
illustrated by the example 2, if hypothesis set is in a region 
of "high" posterior density, then T is a "small" set, and 
hence Ev(H) is "small" (= 0.36), meaning "weak" evidence 
against H. 

For HWE test, the point 9 * = arg sup H f x (9) is given 
as follows. 

Rewriting the posterior pdf under HWE, we have: 

./;«)//)* \\/>: 1 1<2/,./,. )■ . (i6) 

.•=1 j<i 

Taking its logarithm, 

/, (9) = log f x (Q\H) oc £ 2n u log p t + 

t (17) 

X n u lo ^ 2 PiPj ) = h lo g 2 + Z n i lo § Pi ' 

where n i = x tj +Z* =1 JC „, an d h is the sum of sample 
heterozygote frequencies, h = ^ . <( x u . 

By the constraint , p. = \ we have: 
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x = (5, 5, 10) 



£v(fl) = 0.91 




i 1 r 



x = (3,7, 10) 



Ev(H) = 0.36 



I 1 1 1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 



1 1 1 

0.0 0.2 0.4 0.6 0.8 1.0 

Figure 1 - Parameter space representation for two biallelic examples, x = (5,5,10) and x = (3,7,10), and the respective evidence against the HWE hypothe- 
sis. 



l x (9) oc A log 2 + J] «,. log p i + 



(18) 



n k log 



The gradients of 4(9) are given by: 

8l x n i n k 



(19) 



Hence, the optimal point under HWE is given by the 
vector p* ={p\ ,...,p\_ x ) which satisfies: 



Ap = b; A = 



(20) 



b = 



Summing over all constraints and after some algebra, 
we obtain the following solution: 

* * _ n- 
P =Pi =Pi =t^> i = l,.-.,k. 
2n 

The computation of 9* from p* follows fromEq. (1). 

The integration step may be performed by generating 
a set of M points {9 ( " , 9 (2) , . . . , 9 (M) } with a Dirichlet distri- 
bution with parameter (xu + 1, *2i + 1, ^22 + E *kk + 1) 



and computing the percentage of points with posterior den- 
sity greater than/ : 



Dirichlet(x, l + \,x 2l + l,...,x a + 1), r=\,2,...,M 



Ev{H)- 



(21) 



M 



where I(staf) — 1 if stat is true and 0 otherwise. A more pre- 
cise and efficient Monte Carlo method for the integration 
step is presented by Lauretto et al. (2003). 

As with any significance test, this procedure requires 
the choice of a threshold level, x, for acceptance/rejection 
of the hypothesis at a significance level a. Several alterna- 
tive methods have been developed for establishing this 
threshold: 

• An empirical power analysis, developed by Stern 
and Zacks (2002) and Lauretto et al. (2003), pro- 
vides critical levels that are consistent and also ef- 
fective for small samples. 

• A threshold based on reference sensitivity analysis 
and paraconsistent logic is given by Stern (2004). 

• Pereira et al. (2008) relates the e-value threshold to 
standard p-value thresholds. 

• Madruga et al. (2001) proves the existence of a loss 
function that renders the FBST a true Bayesian de- 
cision-theoretic procedure. 

• An asymptotically consistent threshold for a given 
confidence level was given by Stern (2007), and 
Borges and Stern (2007), which we adopt in this 
work. 

Let us consider the cumulative distribution of the evi- 
dence value against the hypothesis, V (1) = Pr(Ev(H) < x), 
given 9°, the true value of the parameter. Under appropriate 
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regularity conditions, for increasing sample size, we can 
state the following: 

If H is false, 9° g H, then Ev(H) converges (in proba- 
bility) to one, that is, V(x) -> 8(1). 

If H is true, 9° e H, then V(%), the confidence level, is 
approximated by the function Q(t, h, x) = 
Chi2(/-A,Chi2' 1 (/,x)) i where t = dim(0), h = dim(#) and 

Chi2(df, x) denotes the cumulative chi-square distribution 
with df degrees of freedom. 

Hence, to reject //with a significance level a, we can 
set x =2 1 (t,h, 1-a), i.e., setx such that g(?,/z,x) = l-a. 

Results and Discussion 

The numerical experiments used in the performance 
analysis are based on three typical datasets. Two examples 
consist of simulated data used in the literature as 
benchmarks for comparing the performance of competing 
methods, while the third example is from a real dataset. 
These examples are presented in Figure 2, as lower triangu- 
lar matrices containing genotype frequencies. The first 
example is taken from Louis and Dempster (1987) and con- 
sists of a sample of size 45 of genotype frequencies distrib- 
uted in four alleles. The second example is given by Guo 
and Thompson (1992) and consists of a sample of size 30 of 
simulated genotype frequencies simulated under HWE 
with underlying allele frequencies (0.2, 0.2, 0.2, 0.2, 0.05, 
0.05, 0.05, 0.05). The third example is from a rheumatoid 
arthritis (RA) study performed by Wordsworth et al. 
(1992), where two hundred and thirty RA patients were 
geno typed for the HLA-DR locus. The DR4 allele was sub- 
divided into Dw4, Dwl4 and other subtypes. DRX repre- 
sents all non-DRl, non-Dw4, non-Dwl4 alleles. This 
example is used by Chen and Thomson (1999) as a bench- 
mark. 

Our main interest is to compare the error convergence 
of FBST and other methods presented in this work 
(MCMC, Chi-square and Likelihood-ratio) for increasing 



sample sizes. For each sample size n e {30, 50, 1 00, 200} , we 
simulated two collections of 100 samples. The first collec- 
tion consists of samples drawn under HWE, i.e., each sam- 
ple is drawn with a multinomial distribution with parame- 
ters («, 9*), with 9* =arg max eeff / t (9). The second 
collection consists of samples drawn with a multinomial 
distribution with parameters («, 9 </l> ), where 9 (/,) is drawn 
under the posterior distribution. That is, each sampling iter- 
ation is performed in two steps: 

a) draw 9'*' ~ Dirichlet(x n + \,x 2l + 1,.. .,x H + l\ 

where Xy (1 <j<i<k) are the frequencies in the original 
dataset; and 

b) drawX (/,) -Multinomial («, 9 (/ " ). 

The Type I error (rejection rate of a true hypothesis) is 
estimated by the proportion of samples in collection 1 such 
that HWE is rejected, and the Type II error (acceptance rate 
of a false hypothesis) is estimated by the proportion of sam- 
ples in collection 2 such that HWE is accepted. The perfor- 
mance criterion used in this work is the average error, i.e., 
the average of Types I and II error rates. Two standard sig- 
nificance levels, a e {0.01, 0.05}, were used to calibrate 
the asymptotic acceptance/rejection threshold of each 
method. 

A variability measure for the errors was obtained by 
performing 10 batches of simulations and computing the 
mean and standard deviation of average errors across the 
batches. 

Figure 3 presents the average errors for FBST, 
MCMC, Chi-square (with continuity correction) and Like- 
lihood ratio for simulated data based on examples 1 , 2 and 
3. The bar height represents the mean of average errors, and 
the vertical line on the top of each bar is the error bar, repre- 
senting the mean ± one standard deviation of average er- 
rors. 

For simulated data based on examples 1 and 3, the 
best competitors are FBST and Likelihood-ratio test, while 
for simulated data based on example 2, the best competitors 



(a) 



(c) 



0 








3 


1 






5 


18 


1 




3 


7 


5 


2 



(b) 



DR1 


5 








Dw4 


40 


12 






Dwl4 


6 


32 


2 




DRX 


30 


55 


15 


33 




DR1 


Dw4 


Dwl4 


DRX 



3 














4 


2 












2 


2 


2 










3 


3 


2 


1 








0 


1 


0 


0 


0 






0 


0 


0 


0 


0 


1 




0 


0 


1 


0 


0 


0 


0 




0 


0 


0 


2 


1 


0 


0 


0 



Figure 2 - Datasets for numerical experiments, given by Louis and Dempster (1987) (a), Guo and Thompson (1992) (b) and Wordsworth etal. (1992) (c). 
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Figure 3 - Average error rates for FBST, MCMC, Chi-square and Likelihood-ratio for simulated data based on examples from Louis and Dempster 
(1987) (a), Guo and Thompson (1992) (b) and Wordsworth et al. (1992) (c). 



are FBST and MCMC. In every case, we notice that FBST 
is always the best competitor (especially for small sample 
sizes, n < 100) or is very close to it. 

These numerical results suggest that FBST is more 
stable than the competitors discussed in this paper, in the 
sense that it has good comparative performance for differ- 
ent datasets and allele numbers. 

Final Remarks 

We have introduced a simple and straightforward 
procedure for testing deviance from Hardy-Weinberg Equi- 
librium (HWE) in the presence of several alleles. This pro- 
cedure was implemented in C language, and integrated into 
a system for parentage testing developed with FAPESP 
support, where it is applied in the selection of loci to be 
used for parentage testing. Further details of this project can 



be found at http://watson.fapesp.br/PIPEM/Pipel3/genetl. 
htm. Currently, the routine is available by request to the 
corresponding author. 
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